Support encrypted VACUUM INTO destinations via file URIs#6107
Support encrypted VACUUM INTO destinations via file URIs#6107tjthiagocosta wants to merge 4 commits intotursodatabase:mainfrom
Conversation
Extract the duplicated cipher/hexkey validation logic from Connection::from_uri and from_uri_attached into a shared OpenOptions::get_encryption_opts() method. Replace the catch-all `_ => OpenFlags::default()` in get_flags() with explicit arms for ReadWrite and Memory, making the match exhaustive over all OpenMode variants.
Teach VACUUM INTO to accept file: URIs as the destination path, enabling mode=, modeof=, cipher=, and hexkey= query parameters. This allows encrypting, re-encrypting, or decrypting a database in a single VACUUM INTO statement via URI parameters.
|
@avinassh probably wants to have a look at this too |
There was a problem hiding this comment.
Pull request overview
This PR extends VACUUM INTO to accept file: URI destinations with cipher and hexkey query parameters, enabling encrypt/re-encrypt/decrypt workflows while also tightening SQLite URI open-mode flag handling and centralizing encryption-option parsing.
Changes:
- Add URI target handling to
VACUUM INTO, including encryption parameter support (cipher,hexkey) and open-mode parity behaviors (includingmode=memory). - Centralize URI encryption-option validation via
OpenOptions::get_encryption_opts()and makeget_flags()explicitly map allOpenModevariants. - Extend dbhash tooling/tests to open and hash encrypted databases for parity assertions.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/dbhash/src/lib.rs | Adds hash_database_with_encryption() to open/hash encrypted DBs with provided opts. |
| tests/integration/query_processing/test_vacuum.rs | Adds integration coverage for URI targets (mode parity, memory mode) and encrypted VACUUM INTO workflows. |
| core/vdbe/execute.rs | Implements VACUUM INTO URI classification + destination opening logic + destination encryption setup. |
| core/util.rs | Updates URI mode→flag mapping and adds OpenOptions::get_encryption_opts() validation helper + tests. |
| core/lib.rs | Adds Database::prepare_new_database_encryption() to preconfigure encryption for new DB creation paths. |
| core/connection.rs | Reuses OpenOptions::get_encryption_opts() in URI open paths (deduplicates validation logic). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Map SQLite URI modes to the corresponding open flags. | ||
| Ok(match self.mode { | ||
| OpenMode::ReadWriteCreate => OpenFlags::Create, | ||
| OpenMode::ReadOnly => OpenFlags::ReadOnly, | ||
| _ => OpenFlags::default(), | ||
| OpenMode::ReadWrite => OpenFlags::None, | ||
| OpenMode::Memory => OpenFlags::Create, | ||
| }) |
There was a problem hiding this comment.
get_flags() now maps mode=rw to OpenFlags::None. When the parsed URI path is :memory: (e.g. file::memory:?mode=rw), Connection::from_uri() will still take the in-memory branch but then pass OpenFlags::None into MemoryIO::open_file, which errors with NotFound because Create is not set. Consider special-casing self.path == MEMORY_PATH inside get_flags() (or in the from_uri memory-path branch) so :memory: targets always use OpenFlags::Create regardless of mode, to avoid this regression and keep file::memory: behavior consistent.
There was a problem hiding this comment.
Good catch! file::memory:?mode=rw would now regress because the in-memory branch gets OpenFlags::None. I’ll special-case MEMORY_PATH in get_flags() and add a regression test for that URI.
| let source_uri = format!( | ||
| "file:{dest_path_str}?cipher={}&hexkey={}", | ||
| source_encryption_opts.cipher, source_encryption_opts.hexkey | ||
| ); |
There was a problem hiding this comment.
The source_uri variable is constructed using dest_path_str and points at the destination DB (with the source encryption params). Renaming it to something like dest_uri_with_source_params would make the intent clearer and avoid confusion when reading the test.
| assert_eq!(rows, vec![("ok".to_string(),)]); | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
what is this test supposed to check?
There was a problem hiding this comment.
It is a regression test for the URI open path, specifically to verify that \file::memory:?mode=memory still takes the in-memory branch and behaves like an in-memory database after the URI flag refactor. I’ll keep it for that coverage unless you’d prefer it removed from this PR.
| .exists() | ||
| .then(|| std::fs::read(&turso_dest)) | ||
| .transpose()?; | ||
| let turso_literal = sqlite_vacuum_literal(&turso_dest, case.target); |
There was a problem hiding this comment.
this is the destination path right, why it is called turso_literal
There was a problem hiding this comment.
This variable holds the target string passed to VACUUM INTO '...' (either a plain path or a file URI)
I'll rename it to vacuum_target.
| ExistingDb, | ||
| } | ||
|
|
||
| fn sqlite_vacuum_literal(path: &Path, target: VacuumIntoTargetLiteral) -> String { |
| let source_hash = hash_database_path_with_encryption(&turso_source, None)?; | ||
| let dest_hash = hash_database_path_with_encryption(&turso_dest, None)?; |
There was a problem hiding this comment.
I’ll simplify those calls.
| } | ||
|
|
||
| Ok(()) | ||
| } |
| } | ||
|
|
||
| #[turso_macros::test] | ||
| fn test_vacuum_into_uri_encrypts_plain_destination(tmp_db: TempDatabase) -> anyhow::Result<()> { |
There was a problem hiding this comment.
‘plain’ refers to the source database being plaintext in that test. The current name is misleading, I’ll rename it to test_vacuum_into_uri_encrypts_plaintext_source
| "file:{dest_path_str}?cipher={}&hexkey={}", | ||
| dest_encryption_opts.cipher, dest_encryption_opts.hexkey | ||
| ); | ||
| let source_uri = format!( | ||
| "file:{dest_path_str}?cipher={}&hexkey={}", |
There was a problem hiding this comment.
I think what you want is dest_uri_with_source_encryption_params
There was a problem hiding this comment.
Both URIs point to the same destination file. The second URI reopens that destination with the source encryption params to verify the vacuumed output no longer opens with the old key/cipher.
dest_uri_with_source_encryption_params is a clearer name. I’ll rename it accordingly.
| text.replace('\'', "''") | ||
| } | ||
|
|
||
| fn vacuum_into_literal(conn: &Arc<Connection>, dest_literal: &str) -> anyhow::Result<()> { |
There was a problem hiding this comment.
It refers to "string literal", but after review, the helper isn't really pulling its weight;
it only does two things: wraps escape_sqlite_string_literal(...) and calls conn.execute(...). It has 4 call sites, so it's better to remove it and inline the calls
There was a problem hiding this comment.
yeah, I suggest you to go over the names / fns and rename them if needed
The get_flags refactor mapped OpenMode::ReadWrite to OpenFlags::None, which broke file::memory:?mode=rw because MemoryIO::open_file requires the Create flag. Force Create in the from_uri memory branch since in-memory databases always need creation regardless of mode.
…pers Add a regression test for file::memory:?mode=rw and simplify the VACUUM URI test helpers and names so the URI mode coverage is easier to follow.
7def7d5 to
2184a31
Compare
| if uri_opts.path == MEMORY_PATH || matches!(uri_opts.mode, OpenMode::Memory) { | ||
| let io: Arc<dyn crate::IO> = Arc::new(crate::MemoryIO::new()); | ||
| return crate::Database::open_file_with_flags(io, MEMORY_PATH, flags, dest_opts, None); | ||
| } |
There was a problem hiding this comment.
I guess this allows vacuum into with :memory:, so lets add a test which verifies that it works as expected
| if metadata.is_file() && metadata.len() == 0 && !flags.contains(OpenFlags::ReadOnly) { | ||
| std::fs::remove_file(&uri_opts.path) | ||
| .map_err(|e| crate::error::io_error(e, "remove_file"))?; |
There was a problem hiding this comment.
i dont think we should ever remove / delete files. instead, raise a proper error and ask user to delete the file
| ))); | ||
| } | ||
|
|
||
| // make sure to create destination database with same experimental features as source | ||
| // Always use PlatformIO for the destination file, even if source is in-memory. | ||
| // This ensures VACUUM INTO actually writes to disk. | ||
| let io: Arc<dyn crate::IO> = Arc::new(crate::io::PlatformIO::new()?); | ||
| // Plain filename destinations keep using PlatformIO so VACUUM INTO writes to disk | ||
| // even when the source database is in-memory. | ||
| let blocking_io: Arc<dyn crate::IO> = Arc::new(crate::io::PlatformIO::new()?); |
There was a problem hiding this comment.
why the comment edited?
and also why it is renamed to blocking_io?
| let preserve_source_reserved_space = | ||
| program.connection.get_encryption_cipher_mode().is_none() | ||
| && dest_encryption_opts.is_none(); |
There was a problem hiding this comment.
isn;t this confusing? it reads like we are preserving reserved space if encryption is not enabled
| "PRAGMA hexkey = '{}'", | ||
| escape_sql_string_literal(&dest_encryption_opts.hexkey) | ||
| ))?; | ||
| dest_conn.set_reserved_bytes(cipher_mode.metadata_size() as u8)?; |
There was a problem hiding this comment.
use required_reserved_bytes fn even though it is same, for readability
| let perms = std::fs::metadata(modeof).map_err(|e| crate::error::io_error(e, "metadata"))?; | ||
| std::fs::set_permissions(&uri_opts.path, perms.permissions()) | ||
| .map_err(|e| crate::error::io_error(e, "set_permissions"))?; | ||
| } |
There was a problem hiding this comment.
i'd verify if this is actually required... because when we open the connection, i thought we set these perms too
Description
Allow
VACUUM INTOdestinations to be specified asfile:URIs withcipherandhexkeyquery parameters, enabling encrypt, re-encrypt,and decrypt workflows in a single statement.
Also fixes URI-target edge cases that surfaced during implementation:
explicit
get_flags()matching for allOpenModevariants, andcentralized encryption-option parsing via
OpenOptions::get_encryption_opts().Test plan
mode=memoryURI targetsAI usage
GPT-5.4 and Claude Opus 4.6 assisted with implementation, review, and testing.
Fixes #5852